Aligning the NWEA RIT Scale with the
California Standards Test (CST)

April, 2004 

John Cronin, Ph.D. - Northwest Evaluation Association

Each year, California students participate in testing as part of the state's assessment program. Students in
grades 2 through 8 take tests that assess reading/writing skills and mathematics. These tests serve as an
important measure of student achievement for the state's accountability system. Results from these
assessments are used to make state-level decisions concerning education, to meet Adequate Yearly Progress
(AYP) reporting requirements of the No Child Left Behind Act (NCLB), and to inform schools and school
districts of their performance.

The California Department of Education has developed scales that are used to assign students to one of five 
performance levels on the state's assessments. These are, from the lowest cut score to the highest: far below 
basic, below basic, basic, proficient, and advanced. For purposes of NCLB, the proficient level is considered 
the level that represents satisfactory performance. 

Many students who attend school in California also take paper or computerized-adaptive tests developed in
cooperation with the Northwest Evaluation Association (NWEA). These tests report student performance
on a single, cross-grade scale, which NWEA calls the RIT scale. This scale was developed using Rasch
scaling methodologies. RIT-based tests are used to inform a variety of educational decisions at the district,
school, and classroom level. They are also used to monitor academic growth of students and cohorts.
Districts choose whether to include these assessments in their local assessment programs. They are not state
mandated.

The versions of NWEA tests in use in California have been specifically aligned to match the content of local 
and California state curriculum standards. Because of this, we believe there is a good match in content 
between the NWEA tests and the curriculum standards being used in California. 

In order to use the two testing systems to support each other, an alignment of the scores from the state and
RIT-based tests is as important as the curriculum alignment. The current study is an expansion of a
preliminary study of alignment of the California Standards Tests (CST) that was performed using data from
one California school system in June 2003. It is one of an ongoing series of studies that are being conducted
to identify the relationships between NWEA tests and state-mandated assessments. Studies of assessments
in sixteen states have now been completed.

The primary questions addressed in this study are: 

• To what extent do the same subject scores for the NWEA test correlate to the content-similar 
subjects on the CST? 

• What fall and spring RIT scores correspond to various performance levels on the CST tests? 

• How well can proficient performance on the California assessments be predicted from fall and 
spring RIT scores? 
Method



Participating School Systems 

To secure participants for the study, an email solicitation was sent in January 2004 to all California school
systems that had two or more seasons of experience with NWEA testing prior to spring 2003.
Based on the response to this solicitation, fall 2002 and spring 2003 CST and NWEA student assessment
records in reading, language usage and mathematics were collected from six school systems. These were the
Capistrano Unified, Escondido Union, Gilroy Unified, Lake Elsinore Unified, and Visalia Unified school
systems. Hawthorne School District supplied CST and NWEA data for their spring 2003 testing season.

Data Preparation 

For purposes of studying NWEA test alignment with the CST, 2nd through 8th grade student test records
from fall 2002 and spring 2003 NWEA assessments were matched with the 2003 CST assessments using
district-assigned student ID numbers. Because NWEA offers assessments in both reading and language
usage, the NWEA records were separately matched to the California CST English Language Assessment.
Matched records were then screened to remove invalid scores. Table 1 shows the number of matched
student records included in the analysis.



Table 1

Reading and Mathematics Tests Included by Grade

                      Grade 2  Grade 3  Grade 4  Grade 5  Grade 6  Grade 7  Grade 8
Fall Reading             4983     8503     8922     8928     9192     9138     8257
Spring Reading          10348    10582    10871    10694    10610    10637     9688
Fall Language            3278     8486     8839     8902     9099     9242     8349
Spring Language          9402     9376     9711     9686     9723     9927     8948
Fall Mathematics         5096     8644     9023     9042     9157     9086     8087
Spring Mathematics      10686    10726    11032    10822    10840    10999     9971



This is the largest pool of students that NWEA has included in a state alignment study to date. We had
enough student records at each grade to adequately cover the breadth of the scale and perform a robust
analysis near the proficiency point for each NWEA tested subject. The number of records available for fall
NWEA testing in second grade was considerably smaller than for spring, mainly because many school systems
do not administer fall NWEA tests to second grade students.

Because local curricula may vary in their alignment with either NWEA or state assessments, we recommend
that schools validate our estimates by cross-checking their own students' performance against our projected
cut scores.



Analyses 

Pearson correlations. The initial analyses focused on the relationships among the NWEA and
California assessment scores at each grade to determine how closely the scores on the NWEA test correlated
with same subject scores on the CST. Simple bivariate correlation coefficients were computed among these
scores.
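
As an illustration of the bivariate correlation described above, the following minimal Python sketch computes a Pearson coefficient for a handful of matched records. The function name `pearson_r` and the six score pairs are hypothetical and are not drawn from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical matched records: one RIT score and one CST scale score per student.
rit_scores = [185, 192, 201, 210, 214, 223]
cst_scores = [262, 290, 318, 345, 351, 398]
r = pearson_r(rit_scores, cst_scores)
print(f"r = {r:.3f}")
```

In practice the same computation would be run separately for each grade, season, and subject pairing.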



Linking CST scores to the RIT scales. Fall and spring scores on the RIT scale were linked
separately to the appropriate scale on the CST. Three methods of estimating cut scores for CST levels were
used. The most straightforward was simple linear regression ($CST_{pred} = a(RIT) + c$). Since we sometimes
observe departures from a linear relationship on the lower and upper ends of state test scales, a second-order
regression model was also used ($CST_{pred} = a(RIT^2) + b(RIT) + c$). For each of these methods, the RIT
score was determined by substituting the appropriate CST score for $CST_{pred}$ and solving the equation for
RIT.
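
The step of solving each fitted equation for RIT can be sketched as follows. This is a minimal Python illustration, not the code used in the study: the function names, the regression coefficients, the CST cut value of 350, and the plausible RIT range of 150 to 300 are all assumptions chosen for the example.

```python
import math


def rit_cut_linear(a, c, cst_cut):
    """Solve cst_cut = a*RIT + c for RIT."""
    if a == 0:
        raise ValueError("slope must be non-zero")
    return (cst_cut - c) / a


def rit_cut_quadratic(a, b, c, cst_cut, rit_range=(150.0, 300.0)):
    """Solve cst_cut = a*RIT^2 + b*RIT + c and keep roots inside rit_range.

    rit_range is an assumed plausible span of RIT scores, used only to
    discard the root of the parabola that falls outside the scale.
    """
    lo, hi = rit_range
    if a == 0:
        root = rit_cut_linear(b, c, cst_cut)
        return [root] if lo <= root <= hi else []
    disc = b * b - 4.0 * a * (c - cst_cut)
    if disc < 0:
        return []  # the fitted curve never reaches this CST score
    roots = {(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)}
    return sorted(root for root in roots if lo <= root <= hi)


# Hypothetical coefficients and a hypothetical CST cut score of 350.
linear_cut = rit_cut_linear(a=3.0, c=-300.0, cst_cut=350.0)
quadratic_cuts = rit_cut_quadratic(a=0.02, b=-5.0, c=500.5, cst_cut=350.0)
print(round(linear_cut, 1), [round(x, 1) for x in quadratic_cuts])
```

A second-order model has two algebraic roots, so some rule is needed to pick the one that lies on the RIT scale; the range filter here is one simple choice.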

A fixed-parameter Rasch model was also used to estimate RIT cut scores. In this method, the CST
performance level was treated as a test item. The assumption is that the performance level 'item' should
contain all the information about the difficulty of the test. Student abilities (RIT scores) were the 'fixed
parameter' used to anchor the difficulty estimate of the 'status' item to the RIT scale. The resulting
'difficulty estimate' was taken as the RIT cut score for this method. This is referred to as the Rasch Status
on Standard (or simply Rasch SOS) method.
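
One way to picture the Rasch SOS idea is the sketch below, which holds student abilities fixed and finds the single item difficulty at which the expected number of students meeting the standard equals the observed number (the maximum-likelihood condition for a Rasch item). This is an illustrative reading of the description above, not NWEA's implementation. The conversion RIT = 10 x logit + 200 is stated here as an assumption, and the function names and six student records are hypothetical.

```python
import math


def rit_to_logit(rit):
    # Assumed conversion between the RIT scale and the logit metric.
    return (rit - 200.0) / 10.0


def logit_to_rit(logit):
    return 10.0 * logit + 200.0


def rasch_item_difficulty(abilities, met_standard, lo=-10.0, hi=10.0, tol=1e-6):
    """Estimate one item's difficulty (logits) with abilities held fixed.

    Bisection on the condition sum(P_i) = number of students who met the
    standard, where P_i = 1 / (1 + exp(-(theta_i - b))).
    """
    target = sum(met_standard)
    if target == 0 or target == len(met_standard):
        raise ValueError("need both passing and failing students")

    def expected_passes(b):
        return sum(1.0 / (1.0 + math.exp(-(theta - b))) for theta in abilities)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_passes(mid) > target:
            lo = mid  # item too easy at this difficulty; move difficulty up
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical students: RIT scores and whether each met the CST standard (1) or not (0).
rit_scores = [190, 200, 210, 220, 230, 240]
met = [0, 0, 0, 1, 1, 1]
difficulty = rasch_item_difficulty([rit_to_logit(r) for r in rit_scores], met)
rit_cut = logit_to_rit(difficulty)
print(round(rit_cut, 1))
```

With this symmetric toy sample the estimated cut falls midway between the highest failing and lowest passing scores; real data would not be so tidy.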

Predicting CST performance levels from RIT scores. Fall and Spring RIT scores were first used to predict
whether students were likely to achieve performance at or above the proficient performance level on the
CST. We make the estimates from this level in order to maintain consistency with prior studies of state test
alignment, which make comparisons based on the NCLB reported performance level. This allows us to
make accurate comparisons of our alignment with different state tests.

The predictions of CST performance were compared to observed performance in 2 x 2 contingency tables.

A prediction index score was generated to measure the ratio of Type I error to accurate prediction of 
proficiency status. This score is expressed as 

1 - (Number of Type I errors / Number of correct predictions)

Higher prediction index numbers generally show more accurate prediction with lower levels of Type I error.
Type I error occurs when NWEA assessments predict that a student will achieve above a passing level of
performance when the student actually achieves a failing score. This index was generated for the linear,
second order, and Rasch SOS methodologies. In general, the highest prediction index score was used to
select the RIT cut score to be adopted as the official RIT score we would associate with achieving the passing
standard on the corresponding CST assessment for the particular grade level and subject area. We do make
exceptions to this rule when the estimated score produces high accuracy rates but inordinately large
numbers of Type II errors. This condition indicates a greatly overestimated cut score, so we select a method
that produces a more balanced Type I to Type II error ratio in these instances.
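
The prediction index can be sketched in a few lines of Python. The function names and the four-student sample are hypothetical. Because the index is a ratio, it gives the same result whether it is fed counts or percentages; the last lines apply it to one accuracy and Type I error pair reported in Table 9 (85.24% and 8.53%).

```python
def tally_2x2(rit_scores, cst_proficient, rit_cut):
    """Count correct predictions, Type I errors, and Type II errors.

    Type I: predicted proficient (RIT >= cut) but not proficient on the CST.
    Type II: predicted not proficient but proficient on the CST.
    """
    counts = {"correct": 0, "type1": 0, "type2": 0}
    for rit, actual in zip(rit_scores, cst_proficient):
        predicted = rit >= rit_cut
        if predicted == actual:
            counts["correct"] += 1
        elif predicted:
            counts["type1"] += 1
        else:
            counts["type2"] += 1
    return counts


def prediction_index(type1_errors, correct_predictions):
    """1 - (Type I errors / correct predictions), as defined in the text."""
    if correct_predictions <= 0:
        raise ValueError("need at least one correct prediction")
    return 1.0 - type1_errors / correct_predictions


# Hypothetical students and a hypothetical RIT cut score of 205.
counts = tally_2x2([200, 210, 220, 230], [False, False, True, True], rit_cut=205)
toy_index = prediction_index(counts["type1"], counts["correct"])

# Rates taken from Table 9: 85.24% accuracy with 8.53% Type I error.
table_index = prediction_index(8.53, 85.24)
print(counts, round(toy_index, 3), round(table_index, 3))
```

Note that the index ignores Type II errors entirely, which is why the text reviews the Type I to Type II balance separately before settling on a cut score.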

In addition, we evaluated the accuracy of predictions of CST levels based on observed RIT scores. The
predictions of CST level performance were compared to observed performance in 5 x 5 contingency tables.
Once again a prediction index score was generated to provide an estimate of accuracy.
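
A five-level comparison of this kind can be sketched as below. The cut scores are the Rasch SOS spring reading estimates for grade 5 from Table 6 (185, 200, 215, 228); the function names and the three student records are hypothetical, and treating each cut as the minimum RIT score for its level is an assumption based on the Table 6 caption.

```python
from bisect import bisect_right

LEVELS = ["far below basic", "below basic", "basic", "proficient", "advanced"]


def predict_level(rit, cuts):
    """Index into LEVELS for a RIT score.

    cuts holds the minimum RIT score for each level above the lowest,
    in ascending order, so a score equal to a cut falls in the higher level.
    """
    return bisect_right(cuts, rit)


def contingency_table(predicted, observed, n_levels=5):
    """Rows are predicted level, columns are observed CST level."""
    table = [[0] * n_levels for _ in range(n_levels)]
    for p, o in zip(predicted, observed):
        table[p][o] += 1
    return table


grade5_reading_cuts = [185, 200, 215, 228]

# Hypothetical students: spring RIT scores and observed CST level indexes.
rit_scores = [216, 221, 207]
observed_levels = [3, 2, 2]
predicted_levels = [predict_level(r, grade5_reading_cuts) for r in rit_scores]
table = contingency_table(predicted_levels, observed_levels)
exact_agreement = sum(table[i][i] for i in range(5)) / len(rit_scores)
print([LEVELS[i] for i in predicted_levels], round(exact_agreement, 2))
```

Cells on the diagonal are exact matches; cells off the diagonal show how far, and in which direction, a prediction missed.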

Content Validity 

Formal comparisons of the content of NWEA and California tests were not conducted for purposes of this
study. The standards used to construct the NWEA Assessments were the same as those used for the
California assessments. Both NWEA assessments and the California assessments include multiple-choice
items. The CST also includes short answer and extended response questions. Results from our previous
studies indicate that the addition of items in alternate formats generally does not, by itself, materially affect
the ability of the NWEA test to generate reasonably accurate predictions of performance levels.

Results 



Descriptive Statistics 

Tables 2 through 4 review descriptive statistics for the CST and NWEA assessments. The median RIT scores
for this sample are generally near or slightly above the NWEA norm in language usage and mathematics.
They are slightly below the NWEA norm in reading. Relative to the CST, average scores are generally near
to or above the norm in both English/Language Arts and mathematics.

Alignment studies require data that adequately represents the range of the scales being measured. In this
case, we concluded from the descriptive statistics that the sample reflected a reasonably representative
population. In addition, the population of students performing near the standards was large and should
produce robust predictions of performance near the proficiency standard. We were concerned about the
number of students who might perform at the far below basic level of performance, since there seemed to be
relatively small numbers of these students in the sample population. No other state that we have studied
assigns a similar designation.

Table 2

Means, Standard Deviations, and Medians for the CST and NWEA assessments - Reading

          ELA matched to fall           Fall NWEA Reading             ELA matched to spring         Spring NWEA Reading
          Mean      Median    SD        Mean      Median    SD        Mean      Median    SD        Mean      Median    SD
Grade 2   [illeg.]  357       52.87     [illeg.]  178       16.27     [illeg.]  341       54.28     [illeg.]  185       16.55
Grade 3   335.51    335       60.54     187.32    189       16.68     [illeg.]  331       60.49     195.58    198       16.47
Grade 4   345.81    346       50.14     [illeg.]  199       16.57     342.61    340       49.58     202.80    205       16.27
Grade 5   336.84    337       47.10     204.38    206       16.67     334.22    334       46.23     208.93    211       16.39
Grade 6   [illeg.]  338       51.77     208.84    211       16.37     335.80    335       51.61     212.62    215       16.79
Grade 7   338.71    339       51.32     214.06    216       15.92     334.56    333       51.51     216.67    219       16.83
Grade 8   331.95    333       49.46     217.47    219       16.26     327.19    327       49.79     220.44    223       16.92

Table 3

Means, Standard Deviations, and Medians for the CST and NWEA assessments - Language Usage

          ELA matched to fall           Fall NWEA Language            ELA matched to spring         Spring NWEA Language
          Mean      Median    SD        Mean      Median    SD        Mean      Median    SD        Mean      Median    SD
Grade 2   347.36    349       53.93     [illeg.]  [illeg.]  [illeg.]  341.44    341       55.29     [illeg.]  [illeg.]  [illeg.]
Grade 3   335.13    335       60.70     [illeg.]  [illeg.]  [illeg.]  332.99    331       61.25     [illeg.]  [illeg.]  [illeg.]
Grade 4   345.87    346       50.13     [illeg.]  [illeg.]  [illeg.]  344.57    343       50.37     206.32    209       15.60
Grade 5   336.95    337       47.04     206.98    209       15.41     335.80    334       47.09     211.91    214       15.02
Grade 6   339.36    338       51.82     211.36    214       15.15     337.89    338       51.95     214.97    217       14.94
Grade 7   338.44    339       51.44     215.49    218       14.31     336.84    336       51.76     218.51    220       14.44
Grade 8   331.65    333       49.44     218.18    220       14.31     330.30    330       49.69     220.99    223       14.53


Table 4

Means, Standard Deviations, and Medians for the CST and NWEA assessments - Mathematics

          CST Math matched to fall      Fall NWEA Math                CST Math matched to spring    Spring NWEA Math
          Mean      Median    SD        Mean      Median    SD        Mean      Median    SD        Mean      Median    SD
Grade 2   338.45    386       75.17     [illeg.]  [illeg.]  10.38     339.61    341       [illeg.]  [illeg.]  189       [illeg.]
Grade 3   335.74    352       73.23     [illeg.]  190       13.21     [illeg.]  331       [illeg.]  [illeg.]  202       [illeg.]
Grade 4   349.21    348       66.43     [illeg.]  203       13.51     342.52    [illeg.]  [illeg.]  209.33    210       15.12
Grade 5   335.81    324       74.57     209.53    210       15.09     334.30    334       46.44     217.63    218       16.77
Grade 6   337.26    329       62.51     215.96    217       16.78     335.61    335       51.73     222.05    223       18.73
Grade 7   330.73    323       57.48     223.25    224       17.84     334.30    333       51.59     227.89    229       20.02
Grade 8   329.80    326       60.58     228.79    230       18.82     326.90    327       49.69     232.82    234       21.01



Pearson correlations 

Table 5 shows the results of this analysis for each grade. Concurrent validity was tested by examining same
subject Pearson correlations between the NWEA and the CST. Same subject correlations were very high.

In reading and language arts, all coefficients between the CST and NWEA tests were above .81, with the
single exception of the fall grade 2 reading and language tests (r=.76 for reading and r=.77 for language). In
mathematics, correlation coefficients generally ranged between .74 and .85. Once again the fall grade 2
coefficient was substantially lower than those for the other tests (r=.67). In the upper grades, reading
assessments correlated slightly more closely with the ELA portion of the CST, while language usage
correlated slightly more closely at the lower grades.

The results suggest that the NWEA tests were generally measuring the same constructs as the CST. We
expected spring NWEA tests to correlate more closely with the CST than the tests administered in the prior
fall. This was the case in all grades except grade 8. The lower grade 2 correlations were not surprising.
Many 2nd graders in the NWEA test population are taking multiple-choice tests for the first time in fall of
second grade, and standardized tests on the whole do not show the same consistency with second graders as
they do in other grades.

Discriminant validity was tested by examining same subject Pearson correlations next to correlations for the
alternate subject on the state assessment. In particular, we tested the NWEA and CST math tests against the
California ELA Standards Test. We tested the NWEA reading and language usage tests and the California
ELA tests against the CST Math. In all instances the same subject correlations were higher than correlations
against the alternate subject, leading us to conclude that these assessments were more likely to be testing
similar constructs than dissimilar.
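
The discriminant comparison amounts to checking, test by test, that the same subject coefficient exceeds the alternate subject coefficient. The short Python sketch below does this for the grade 4 coefficients reported in Table 5; the dictionary layout and function name are hypothetical conveniences.

```python
# Grade 4 coefficients from Table 5: (same subject r, alternate subject r).
grade4 = {
    "NWEA Reading, fall": (0.828, 0.700),
    "NWEA Reading, spring": (0.833, 0.715),
    "NWEA Language, fall": (0.822, 0.715),
    "NWEA Language, spring": (0.811, 0.710),
    "NWEA Math, fall": (0.788, 0.759),
    "NWEA Math, spring": (0.833, 0.788),
}


def discriminant_gaps(pairs):
    """Same subject r minus alternate subject r for each NWEA test."""
    return {name: round(same - cross, 3) for name, (same, cross) in pairs.items()}


gaps = discriminant_gaps(grade4)
all_higher = all(gap > 0 for gap in gaps.values())
print(gaps, all_higher)
```

A positive gap for every test is consistent with the conclusion drawn above, although the gaps for mathematics are visibly narrower than those for reading and language usage.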

Table 5

Pearson Correlations for CST and NWEA assessments by Subject

Grade 2
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .761     .810     .770     .827     .760     .688     .750
CST Math    .760      .616     .669     .616     .698     1.000    .670     .752

Grade 3
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .812     .837     .821     .845     .798     .745     .778
CST Math    .728      .682     .728     .705     .751     1.000    .756     .818

Grade 4
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .828     .833     .822     .811     .782     .759     .788
CST Math    .782      .700     .715     .715     .710     1.000    .788     .833

Grade 5
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .826     .817     .811     .812     .762     .767     .775
CST Math    .762      .700     .701     .710     .718     1.000    .811     .845

Grade 6
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .841     .834     .818     .814     .798     .784     .792
CST Math    .798      .730     .729     .724     .725     1.000    .839     .855

Grade 7
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .832     .831     .807     .807     .781     .787     .784
CST Math    .781      .708     .706     .708     .710     1.000    .851     .851

Grade 8
                      NWEA Reading      NWEA Language     CST      NWEA Math
Assessment  CST ELA   Fall     Spring   Fall     Spring   Math     Fall     Spring
CST ELA     1.000     .815     [illeg.] .792     .783     .707     .767     .746
CST Math    .707      .658     .666     .672     .657     1.000    .784     .772

Analysis of scatterplots suggested that relationships between most NWEA tests and their CST
counterpart were strongly curvilinear with a pronounced floor effect at some grades. Figure 1 provides
an example from the 8th grade reading sample that illustrates both the scale relationships and the
evidence of some breakdown in correlation near the bottom of the CST scale. Note how the
correlation between the two tests flattens for students performing below 300 on the CST. Note also
that large numbers of students achieving below 300 on the CST test achieve a wide range of scores
(between 160 and 220 RIT) on the corresponding NWEA exam. One possible explanation for this is
that the NWEA test, because it is adaptive as opposed to single form, has the capacity to adjust the
difficulty of the test to enable more accurate measurement at the low end of performance.

Figure 1 - Scatterplot depicting Grade 8 NWEA math RIT against the Grade 8 CST math scale

Linking CST performance level cut scores to the RIT scale

The primary purpose of this study was to estimate the fall and spring RIT scale scores that most closely
correspond to the cut scores for the different performance levels on the CST. This information allows
schools to identify students who may need additional support to reach state standards. It can also help
schools identify students who are performing well enough that they are ready to tackle work beyond what
the state standards require.

Tables 6 and 7 show several estimations of the Fall and Spring RIT scores that correspond to the cut scores
for the various performance levels on the CST scales. As a rule the three methodologies came to very
similar estimates of the cut score for each of the performance levels. Estimates of the two lowest (far below
basic and below basic) and highest (advanced) cut scores varied more, in part because far fewer students
perform at these levels and in part because of the non-linear nature of the relationship. In some grades,
calibration of the below and far below basic estimates was inconsistent. For example, second order
regression estimated a far below basic/below basic cut score for fall of grade 4 in language usage and grade 6
in mathematics (see Table 7) that was lower than the respective prior year's estimates. In some cases this
may have occurred because the estimated fall cut scores for the lowest level of the CST were close to the lowest
valid scores on the NWEA scale.



Table 6

Estimated points on the RIT scale for SPRING that equate to the minimum scores (rounded) for
performance levels on the CST

Reading
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grade 2   [values illegible in source]
Grade 3   [values illegible in source]
Grade 4   174      186      206      [illeg.] 166      188      208      220      174      191      208      218
Grade 5   183      194      216      235      179      197      217      229      185      200      215      228
Grade 6   188      199      218      235      188      203      220      232      190      204      219      230
Grade 7   190      204      223      242      188      207      225      238      193      208      223      235
Grade 8   196      209      230      248      194      212      230      242      201      214      229      240

Language Usage
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grades 2 through 8   [values missing in source]

Mathematics
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grade 2   [values illegible in source]
Grade 3   [values illegible in source]
Grade 4   182      197      212      225      180      198      212      225      184      201      211      223
Grade 5   197      209      224      245      194      211      224      241      198      213      224      239
Grade 6   194      211      231      252      189      214      231      248      192      215      229      245
Grade 7   197      217      239      265      188      219      239      259      200      221      238      257
Grade 8   202      223      246      273      197      225      246      267      208      227      244      264

Table 7

Estimated points on the RIT scale for the FALL PRIOR to CST testing that equate to the minimum scores
(rounded) for performance levels on the CST

Reading
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grade 2   [values illegible in source]
Grade 3   [values illegible in source]
Grade 4   166      179      199      216      155      181      201      214      163      184      201      211
Grade 5   177      189      210      229      172      191      211      224      179      194      210      223
Grade 6   183      194      213      229      181      197      215      227      185      199      214      225
Grade 7   187      200      218      237      184      203      220      234      190      204      218      231
Grade 8   192      205      225      242      189      207      225      237      196      210      224      236

Language Usage
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grade 2   [illeg.] [illeg.] [illeg.] [illeg.] [illeg.] 161      179      197      [illeg.] [illeg.] [illeg.] [illeg.]
Grade 3   [illeg.] [illeg.] [illeg.] [illeg.] [illeg.] 182      197      211      [illeg.] [illeg.] [illeg.] [illeg.]
Grade 4   [illeg.] 183      202      218      [illeg.] 186      [illeg.] 217      169      189      204      215
Grade 5   181      192      212      230      178      195      214      225      184      198      212      224
Grade 6   186      197      215      230      187      201      217      228      190      203      217      226
Grade 7   190      202      219      236      187      205      221      233      195      207      220      230
Grade 8   195      206      224      240      195      210      226      237      200      212      224      234

Mathematics
          Linear Regression                   Second-order Regression             Rasch Status-on-Standard
          Below    Basic    Prof     Adv      Below    Basic    Prof     Adv      Below    Basic    Prof     Adv
Grade 2   [illeg.] [illeg.] [illeg.] [illeg.] 138      158      [illeg.] 183      [illeg.] [illeg.] [illeg.] [illeg.]
Grade 3   [illeg.] [illeg.] [illeg.] [illeg.] 151      177      [illeg.] 203      [illeg.] [illeg.] [illeg.] [illeg.]
Grade 4   174      188      201      215      168      189      [illeg.] 213      176      193      203      212
Grade 5   189      201      213      233      188      203      215      230      191      206      215      227
Grade 6   189      204      220      241      183      207      [illeg.] 238      188      208      221      235
Grade 7   197      212      231      254      193      215      [illeg.] 250      197      215      231      248
Grade 8   200      217      237      262      196      218      [illeg.] 257      204      221      236      256


Predicting CST pass-fail status from RIT scores 

Once the spring and fall cut scores were estimated from the three methods, we evaluated each possible cut
score to determine how accurately it predicted students' actual performance on the corresponding CST
assessment. The most accurate method of prediction was generally used to derive the best estimate of RIT
cut scores that equate to the different CST performance levels. Once again a prediction index statistic
(described in the Method section) scored the accuracy of prediction.

For this study, we first assessed the accuracy of the RIT scale in correctly predicting whether students are
likely to reach the proficient level on the corresponding CST test. Next we assessed the accuracy with which
the RIT predicted proper performance level assignment on this test. Use of the prediction index statistic
helped assure that the method chosen produced a high ratio of accurate passing predictions relative to Type
I errors. Type I errors occur when the RIT scale predicts a passing score for a student who actually fails the
assessment. These types of errors raise particular concern because they fail to identify students who might
need additional support and resources in order to achieve their targets. A high prediction index number
indicates that the test maximizes accuracy of prediction while minimizing Type I errors.

In these kinds of studies we want to emphasize that prediction is not used to foretell an inevitable future for
the student; rather, it is used to help schools plan for instruction and offer appropriate interventions to
children who need additional support to be successful. For purposes of the No Child Left Behind Act,
schools are judged on their ability to move children to the proficient level and beyond. RIT scores can
provide teachers with advance notice about students who may not reach these goals on the California
assessment that corresponds to their grade level.

Tables 8, 9, and 10 summarize the results. When using spring RIT scores, all methods accurately predicted
proficiency status with an average accuracy rate of 84% or better in English/Language Arts and 83% for mathematics.
When using fall RIT scores the accuracy rate dropped only slightly, with all methods accurately predicting
pass/fail status with an accuracy rate greater than 83% for English/Language Arts and 82% for mathematics.
Second-order regression methods were consistently more accurate at predicting proficiency status than the
other methods.

Table 8 

Accuracy of reading RIT scores in predicting CST proficiency status - ELA 



G rade 2 



Linear 

Second 0 rder 
Rasch 



G rade 3 



Linear 

Second 0 rder 
Rasch 



G rade 4 



Linear 

Second 0 rder 
Rasch 



G rade 5 



Cut 

Score 




Spring 

Accuracy I Type I 
Error 



Prediction 

Index 




Accuracy 






92 


82.93 


94 


83.65 


93 


83.39 



Type 1 


Prediction 


Cut 


Accuracy 


Type 1 


Prediction 


Error 


Index 


Score 




Error 


Index 



10.61% 




2 


85.16% 


8.65% 


.898 


3 


85.17% 


6.04% 


.929 


2 


85.16% 


8.65% 


.898 



Cut 


Accuracy Type 1 


Prediction 


Cut 


Accuracy 




Score 


Error 


Index 


Score 







Type I 
Error 



Prediction 

Index 



■ 



L99 


83.; 


101 


84.; 


101 


84.; 




Cut 

Score 



Accuracy 



Linear 


210 


Second 0 rder 


211 


Rasch 


210 



Type 1 


Prediction 


Cut 


Accuracy 


Type 1 


Prediction 


Error 


Index 


Score 




Error 


Index 




G rade 6 



Linear 

Second 0 rder 
Rasch 



G rade 7 



Cut 

Score 



Accuracy 



3 


7.74 


3 


6.33 


3 


9.36 







3 


85.65 


5 


85.37 


4 


85.62 



Type 1 


Prediction 


Cut 


Accuracy 


Type 1 


Prediction 


Error 


Index 


Score 




Error 


Index 




Cut 

Score 



Accuracy 



Linear 

Second 0 rder 


m 


84.80 

85.53 


Rasch 


218 


84.80 




/„ 


.914 


1 


Prediction 


r 


Index 


h 


.892 


4 


.933 


h 


.892 



MX 


86.03 




86.23 


219 


86.51 




ra H p P 


Cut 


Accuracy Type 1 


Prediction 


Cut 


Accuracy 


Type 1 


Prediction 


u i a u c o 


Score 


Error 


Index 


Score 




Error 


Index 



Linear 

Second 0 rder 
Rasch 




30 


85.4 


30 


85.4 


29 


85.5 

Table 9

Accuracy of language usage RIT scores in predicting CST proficiency status - ELA 



Linear 
Second 0 rder 
Rasch 



G rade 5 



Linear 

Second 0 rder 
Rasch 



G rade 6 



Linear 

Second 0 rder 
Rasch 



G rade 7 



Linear 

Second 0 rder 
Rasch 



Linear 

Second O rder 
Rasch 


pa 


81.15% 

81.42% 

81.15% 




.874 

.891 

.874 


192 

193 
193 








Grade 3
              Fall                                                   Spring
              Cut Score  Accuracy  Type I Error  Prediction Index    Cut Score  Accuracy  Type I Error  Prediction Index
Linear        195        83.00%    [illeg.]      .871                [illeg.]   85.24%    8.53%         .900
Second Order  197        83.60%    7.97%         [illeg.]            [illeg.]   85.31%    7.13%         .916
Rasch         196        83.40%    [illeg.]      .888                205        85.24%    8.53%         .900


G rade 4 


Cut 

Score 


Accuracy 




rype 

Error 


1 


Prediction 

Index 


Cut 

Score 


Accuracy 


Type 1 
Error 


Prediction 

Index 



G rade 2 



Fall 

Accuracy Type I 
Error 




202 

205 

204 



82.59% 

83.43% 

83.13% 



12.37% 

7.53% 

9.43% 



.850 

.910 

.887 



208 

210 

210 



83.42% 

84.37% 

84.37% 



12.52% 

9.29% 

9.29% 



Cut Accuracy 
Score 



212 

214 

212 



83.96% 

83.86% 

83.96% 



9.28% 

6.83% 

9.28% 



.889 

.919 

.889 



218 

218 

218 



83.70% 

83.70% 

83.70% 



8.72% 

8.72% 

8.72% 



215 

217 

217 



219 
221 

220 



83.62% 

84.24% 

84.24% 



Accuracy 



83.41% 

83.48% 

83.51% 



11.06% 
7.9 
7.9 



.868 

.906 

.906 



219 
221 

220 



83.03% 

84.13% 

83.57% 



11.19% 

7.42% 

9.35% 



10.35% 

7.10% 

8.62% 



.876 

.915 

.897 



223 

225 

223 



83.15% 

82.87% 

83.15% 



9.76% 

6.69% 

9.76% 



.850 

.890 

.890 



1 Prediction 


Cut 


Accuracy Type 1 


Prediction 


Index 


Score 


Error 


Index 



.896 

.896 

.896 



Cut 


Accuracy Typ 


el Prediction 


Cut 


Accuracy Type 1 


Prediction 


Score 


Err 


or Index 


Score 


Error 


Index 



.865 

.912 



1 Prediction 


Cut 


Accuracy Type 1 


Prediction 


Index 


Score 


Error 


Index 



.883 

.919 

.883 



Grade 8
              Fall                                                   Spring
              Cut Score  Accuracy  Type I Error  Prediction Index    Cut Score  Accuracy  Type I Error  Prediction Index
Linear        224        83.47%    9.21%         [illeg.]            228        83.50%    7.77%         .907
Second Order  226        83.71%    5.77%         .931                230        83.12%    [illeg.]      .942
Rasch         224        83.47%    9.21%         [illeg.]            227        83.45%    9.37%         .888
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Table 10 

Accuracy of mathematics RIT scores in predicting CST proficiency status - mathematics 





Fall 


Spring 


G rade 2 


Cut 

Score 


Accuracy 


Type 1 
Error 


Prediction 

Index 


Cut 

Score 


Accuracy 


Ty 

El 


Linear 


imM 


78.69% 


14.38% 


7iT7 






IT 


Second 0 rder 




78.69% 


14.38% 


.872 


185 


80.49% 




Rasch 


172 


78.57% 


11.48% 


.854 


185 


80.49% 






Cut 


Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 




Score 




Error 


Index 


Score 




E 


Linear 


188 


78.24% 


1157% 


Ml 






■!!] 


Second 0 rder 




78.81% 




.872 


202 


82.61% 


8. 


Rasch 


189 


78.75% 


11.79% 




202 


82.61% 


8. 




Cut 


Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 




Score 




Error 


Index 


Score 




El 


Linear 


HOH 


Blil 


■BNGHM 






8151% 


87 


Second 0 rder 


202 


80.56% 


11.70% 




212 


83.63% 


7. 


Rasch 


203 


80.51% 


10.35% 


■pH 


211 


83.61% 


8. 




Cut 


Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 




Score 




Error 


Index 


Score 




E 


■IfiTXnH 


wn 


— »!»■ 
« ]» 










■:i 










mm . 


224 






yum 








Wmm 


224 


|K 








Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 




Score 




Error 


Index 


Score 




E 


Linear 


m 


85.19% 


972 % 








m i 


Second 0 rder 


223 


85.56% 


4.77% 




231 




5. 


Rasch 


221 


85.72% 


7.69% 




229 




m 


G rade 7 


Cut 


Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 


Score 




Error 


Index 


Score 




E 


Linear 


211 


86.78% 


733% 


m 


BE 


87776% 


77 


Second 0 rder 


233 


86.84% 


4.99% 


.943 


239 


88.07% 


6. 


Rasch 


231 


86.78% 


7.43% 


.914 


238 


87.76% 


7. 


G ra d p 8 


Cut 


Accuracy 


Type 1 


Prediction 


Cut 


Accuracy 


Ty 


VJ 1 Out u 


Score 




Error 


Index 


Score 




El 


Linear 


237 


79.93% 


|mH| 






8189% 


97 


Second 0 rder 


237 


79.93% 






246 


81.97% 


8. 


Rasch 


236 


SB 




.866 


244 


81.85% 






.892 

.872 



Table 11 summarizes the accuracy of proficiency prediction for this study relative to other state alignment
studies. Prediction index scores for California are near average in reading and slightly above average for the
language usage test (relative to predicting results in English/Language Arts). Prediction index scores for
mathematics were lower than the average for prior state alignment studies that we have conducted. The
table suggests that little accuracy was lost when we used the fall assessment to predict state assessment
proficiency status. Prediction index averages for the fall assessment were only slightly lower than spring.

One factor affecting accuracy of proficiency status prediction in California was the state's testing of second
grade students. California is the only state we have studied to date that administers its state assessment in
second grade. We expected that the accuracy of prediction for second graders would be somewhat lower
than for third graders, and the results reflected our expectations.

Despite this fact, the rates of correct prediction are easily high enough to provide useful information to
educators who are planning instruction to ensure all students perform at a level that meets the standards.
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Table 11 

Prediction Indices (Based on Proficiency Status) 
for Previous NWEA State Alignment Studies



State               Reading    State               Language    State               Math
Texas               .974       Texas               .968        Texas               --
Washington          .971       California (spr)    .913        Wyoming             .961
Minnesota           .944       California (fall)   .913        Colorado '01        .957
Pennsylvania        .935       Indiana '01         .907        Washington          .949
Wyoming             .931       Colorado '03        .903        Illinois            .946
Colorado '03        .931       Indiana '03         .894        Colorado '03        .943
Illinois            .928       Arizona             .874        South Carolina      .943
California (spr)    .925                                       Minnesota           .936
California (fall)   .914                                       Washington          .936
Arizona             .912                                       Pennsylvania        .926
Colorado '01        .910                                       Arizona             .919
Nevada              .902                                       California (spr)    .910
South Carolina      .902                                       Indiana '01         .899
Indiana '01         .902                                       California (fall)   .895
Indiana '03         .900                                       Nevada              .866
Washington          .886                                       Indiana '03         .860



* Texas results were generated by a study of over 1,000 students per grade from a single school district.



Predicting CST Performance Levels from RIT Scores 



The CST reports five levels of performance. Four cut scores are set to define these five levels.
Analyzing the capacity of RIT scores to predict students' CST performance levels can help educators
triangulate information about student performance on their state test, assuring that instructional plans and
interventions are adequately reinforced by data. Predictions of performance level are not as accurate as the
predictions of proficiency status. This is true in part because tests vary in their ability to measure students
at the highest and lowest performance levels. In the case of the California state assessment, predictions of
performance level were influenced by the high number of performance levels used for the test (California
and Minnesota are the only states we have studied that use five) and the small number of students scoring
in the lowest category (far below basic) on the state assessment.

When predicting performance levels, a case is identified as accurate when the performance level assigned by
the CST and the RIT score are the same. A Type I error occurs when the RIT score assigns a performance level
that is higher than the student actually achieved on the state test. For example, if the RIT score projects an
advanced performance for the student and the CST result is proficient, we declare the case a Type I error
because the RIT score overestimated performance.
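
The two definitions above can be sketched in a few lines of Python. This is only a minimal illustration, not the computation used in the study: the function name `level_accuracy` and the four sample students are hypothetical, and the level ordering is the one the CST uses (far below basic through advanced).

```python
# Ordered CST performance levels, lowest to highest.
LEVELS = ["far below basic", "below basic", "basic", "proficient", "advanced"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def level_accuracy(predicted, actual):
    """Return (accuracy, type_i_rate) for paired performance levels.

    predicted: levels projected from RIT scores
    actual:    levels the same students earned on the CST
    A case is accurate when both levels match. It is a Type I error when
    the RIT projection is higher than the CST result.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(predicted, actual))
    accurate = sum(1 for p, a in pairs if RANK[p] == RANK[a])
    type_i = sum(1 for p, a in pairs if RANK[p] > RANK[a])
    return accurate / len(pairs), type_i / len(pairs)


# Hypothetical sample: one overestimate, two matches, one underestimate.
predicted = ["advanced", "proficient", "basic", "proficient"]
actual = ["proficient", "proficient", "basic", "advanced"]
accuracy, type_i_rate = level_accuracy(predicted, actual)
print(accuracy, type_i_rate)
```

Underestimates (the fourth student here) lower the accuracy rate but are not counted as Type I errors.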

In addition to assessing the rate of correct prediction, we also assessed accuracy by evaluating the success
with which the projected RIT cut scores for the highest and lowest performance levels identified students in
these two categories. For example, if 1000 grade 3 students performed at the advanced level in a subject and
a RIT score identified 600 of these students as advanced, then we would say the RIT score was successful at
finding 60% of the advanced students. For the highest and lowest performance levels, we used this methodology
to assign the cut score that would best predict the far below basic and advanced performance levels.
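
The "percent found" figure in the example can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `percent_found`, the sample scores, and the cut score of 225 are hypothetical, and the sketch assumes that "found" means a student who truly earned the level also falls on the matching side of the RIT cut score.

```python
def percent_found(rit_scores, cst_levels, cut_score, level="advanced"):
    """Share of students who earned `level` on the CST and are also
    identified by the RIT cut score.

    For "advanced", a student is found when the RIT score is at or above
    the cut. For "far below basic", a student is found when the RIT score
    is below the cut.
    """
    at_level = [r for r, lvl in zip(rit_scores, cst_levels) if lvl == level]
    if not at_level:
        raise ValueError("no students earned the requested level")
    if level == "advanced":
        found = sum(1 for r in at_level if r >= cut_score)
    elif level == "far below basic":
        found = sum(1 for r in at_level if r < cut_score)
    else:
        raise ValueError("only the highest and lowest levels are handled")
    return found / len(at_level)


# Hypothetical data matching the example in the text: 1000 students who
# were advanced on the CST, 600 of whom score at or above the RIT cut.
rit_scores = [230] * 600 + [220] * 400
cst_levels = ["advanced"] * 1000
print(percent_found(rit_scores, cst_levels, cut_score=225))
```

Comparing this share across candidate cut scores is one way to see which cut finds the most students at the extreme levels.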

Tables 12, 13 and 14 summarize these results.
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Table 12 

Accuracy of the RIT scale in predicting CST performance level - reading



G rade 2 



Linear 
2 nd Order 
Ra sch 



G rade 3 



Linear 
2 nd Order 
Ra sch 



G rade 4 



Linear 
2 nd Order 
Ra sch 



G rade 5 



Linear 
2 nd Order 
Ra sch 



G rade 6 



Linear 
2 nd Order 
Ra sch 



G rade 7 



Linear 
2 nd Order 
Ra sch 



G rade 8 



Linear 
2 nd Order 
Ra sch 







Fall 










Spring 






Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% BB 
Found 


39.9% 


iwr 




28 .y% 


■nwKgg 


53.9% 


733%^ 


75P 








15.3% 


.621 




BfpfMfl 


54.3% 


22.9% 


.579 


52.7% 


49.9% 




26.6% 


.476 


65.7% 


27.7% 


54.3% 


25.2% 


.536 


65.0% 


55.0% 



Accuracy 



46.7% 

48.9% 

58.8% 



T574TT 

44.4% 

58.1% 



rediction 

Index 




l B,B ,' Accuracy T c ype 
Found ’ Error 



20.9% 

23.4% 





m 

m 








Accuracy 


Type 1 
Error 


l>reaiction 

Index 


T^av^ 

Found 


% B.B 
Found 



57.8% 

60.2% 

59.3% 



22.7% 

21 . 2 % 

21.3% 



Accuracy 



H 


IB 








Accuracy 


Error 


l>reaiction 

Index 


%Aa^ 

Found 


^%bT 

Found 



59.1% 

58.2% 

59.3% 



22.3% 

17.1% 

21.3% 



Accuracy ^ 



EljXjK^pl 


wgwm 




EMjtH 


iwg 








luiB 


HSVIfl 




BTilitM 




mm 


mnW*M 


Accuracy 


Type 1 
Error 


PreaictKHi 

Index 


% Adv. 
Found 


% B.B 
Found 



60.5% 

60.7% 



17.2% 

20.9% 



Accuracy ^ 



37.3% 

34.0% 

23.2% 






HBHVfl 




HjMM 








mujwm 




























Bt j^Hpl 







Accuracy 1 ^ 1 yr * a \ a ' on * B - B ; Accuracy 1 

’ Error Index Found Found ’ Error 



3177% 

34.0% 

22 . 8 % 





■n 




mm 


24.3% 




MIW 


4ZI% 


.235 


46.2% 


27.9% 


58. .2% 


24.5% 


.579 


56.4% 




.608 


68.7% 


51.2% 


57.7% 


22.8% 




63.9% 
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Linear 
2 nd Order 
Ra sch 



G rade 3 



Linear 
2 nd Order 
Ra sch 



G rade 4 



Linear 
2 nd Order 
Ra sch 



G rade 5 



Linear 
2 nd Order 
Ra sch 



G rade 6 



Linear 
2 nd Order 
Ra sch 



G rade 7 



Linear 
2 nd Order 
Ra sch 



G rade 8 



Linear 
2 nd Order 
Ra sch 



Table 13 

Accuracy of the RIT scale in predicting CST performance level - language usage 



Fall 


Spring 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% BB 
Found 












jr?s 




HH!E!^EM5£i§ 








IiiVSkH 







































Accuracy 



45.8% 

47.1% 

58.2% 



46.9% 

49.1% 

56.8% 



rediction % Adv. 
Index Found 




T B, ' l ' Accuracy T c ype 
Found ’ Error 



TUT 

23.1% 

25.3% 



kfiXiKM 










L' 














Type 1 


l>reaiction 


T^av^ 


% B.B 


Error 


Index 


Found 


Found 



58.0% 

60.6% 

59.6% 



Tl2% 

44.4% 

57.1% 


37.9% 

36.8% 

22.6% 


WlMSgBg 


TOT 

47.9% 

59.6% 


33.6% 

29.2% 

47.7% 


Accuracy 


Type 1 
Error 


l>reaiction 

Index 


% Adv. 
Found 


% B.B. 
Found 



56.0% 

57.9% 

56.8% 









EMjfiMj 


















,X? 






Accuracy 


Type 1 
Error 


PreaictKHi 

Index 


% Adv. 
Found 


% B.B 
Found 



5 577% 
57.8% 
57.5% 



24.6% 

21 . 2 % 

22 . 2 % 



Accuracy ^ 



22.9% 

22 . 0 % 

22.5% 



Accuracy ^ 



20 . 8 % 

23.2% 



Accuracy ^ 




38.5% 

35.1% 

22.7% 



379 


3I59OT 


25.5% 


MiMW 








■my 


.286 


49.9% 




57.3% 


|fe? jt B 




'Ell 


32.2% 




66.8% 


49.2% 


56.8% 


BJjrfl 




B£fl 


50.7% 



Accuracy 1 ^ 1 prf ; fl ! ctlon 'l ^ Accuracy 1 

’ Error Index Found Found ’ Error 




SI 



45.7% 

54.7% 









HEH 




■Bfcri.'iTB 




wnea 


KfXVIV 










I f 










rlllW 






Wmm 


Hi 








ICT 



Northwest Evaluation Association, 4/19/2004




















Table 14 

Accuracy of the RIT scale in predicting CST performance level - mathematics



G rade 2 



Linear 
2 nd Order 
Ra sch 



G rade 3 



Linear 
2 nd Order 
Ra sch 



G rade 4 



Linear 
2 nd Order 
Ra sch 



G rade 5 



Linear 
2 nd Order 
Rasch 



G rade 6 



Linear 
2 nd Order 
Ra sch 



G rade 7 



Linear 
2 nd Order 
Ra sch 



G rade 8 



Linear 
2 nd Order 
Ra sch 



Fall 




Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 




lljj 


7134 

.442 

-.260 


3b. 8% 
52.9% 
54.1% 


(H 




2618% 

24.4% 

25.7% 


Accuracy 


Error 


Prediction 

Index 


% Adv. 
Found 


Found 


Accuracy 


Type 1 
Error 


T57% 

49.6% 

42.9% 


29.3% 

27.3% 

42.7% 


WIHM 


5575% 

58.2% 

52.8% 


30% 

26.6% 

37.9% 


357% 

56.1% 

56.4% 


24.3% 

21.6% 

22.2% 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B, 
Found 


Accuracy 


Type 1 
Error 






3R7 

.450 

.454 


397% 

62.5% 

65.4% 


363% 

19.8% 

39.4% 


363% 

56.6% 

56.2% 


25.0% 

22.6% 

22.7% 


Accuracy 


E^rror 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


5T77% 

53.9% 

46.9% 


24.3% 

19.6% 

22.1% 


3510 

.637 

.529 


367% 

52.9% 

54.1% 


367% 

42.5% 

54.7% 


5 677% 
58.7% 
58.5% 


20.4% 

16.6% 

19.8% 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


3777% 

59.6% 

50.4% 


24.0% 

19.3% 

21.8% 


3554 

.676 

.567 


353% 

58.2% 

52.8% 


3T7 7% 
26.6% 
37.9% 


317% 

64.0% 

63.3% 


20.5% 

16.7% 

19.9% 


Accuracy 


Type 1 
Error 


Prediction 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


3U7% 

62.1% 

49.6% 


22.1% 

19.2% 

17.8% 




333% 

64.0% 

43.5% 


314% 

31.1% 

43.4% 


314% 

62.8% 

62.0% 


22.7% 

22.1% 

20.6% 


Accuracy 


Type 1 
Error 


PreaictKHi 

Index 


% Adv. 
Found 


% B.B. 
Found 


Accuracy 


Type 1 
Error 


5377% 

53.4% 

49.9% 


24.9% 

27.0% 

21.9% 


3536 

.495 

.562 


343 % 

54.5% 

42.7% 


3U3% 

31.1% 

49.8% 


■ 


a 



Spring 



Prediction 


% Adv. 


% BB 


Index 


Found 


Found 


7178 


58./% 


IH1 


.521 


53.6% 


14.9% 






40.4% 




.615 78.9% 60.4% 

.607 69.5% 55.6% 




.601 76.5 % 60.5% 

.596 69.2% 52.3% 




.739 80.6% 47.2% 

.685 69.7% 43.5% 




.479 44.5% 25.9% 

.504 60.1% 52.9% 
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Table 15 

Prediction index scores by performance level assignment
for previous NWEA state alignment studies

State             Prediction Index    State             Prediction Index
Washington        .874                Washington        .928
Texas             .868                Texas             .900
Indiana           .860                Illinois          .888
Colorado          .840                Colorado          .808
Illinois          .804                Washington        .805
Nevada            .776                Indiana           .804
Pennsylvania      .770                Pennsylvania      .769
South Carolina    .757                South Carolina    .764
Arizona           .756                Arizona           .756
Washington        .698                Nevada            .742
Minnesota         .627                Minnesota         .611
California        .600                California        .565



Best estimates of CST performance level cut scores

To determine the RIT scores that best predict the cut scores for the various California performance levels, we
did the following:

• For the proficient and basic RIT cut score, we selected the methodology that produced the highest
overall performance index score.

• For the far below basic RIT score and the advanced RIT score, we selected the cut scores that
correctly predicted the largest proportion of students who actually achieved these levels of
performance on the CST.
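
Both selection rules above amount to picking the method with the largest value of some score. A minimal sketch, with every number hypothetical and the helper name `pick_by_highest` introduced only for illustration:

```python
def pick_by_highest(scores_by_method):
    """Return the method whose score is largest.

    On a tie, the method listed first in the dictionary is returned.
    """
    return max(scores_by_method, key=scores_by_method.get)


# Hypothetical index scores for one grade and season (first rule).
index_scores = {"Linear": 0.58, "Second Order": 0.62, "Rasch": 0.60}

# Hypothetical share of advanced students found by each method's
# cut score (second rule).
advanced_found = {"Linear": 0.54, "Second Order": 0.49, "Rasch": 0.66}

print(pick_by_highest(index_scores))
print(pick_by_highest(advanced_found))
```

The two rules can select different methods for the same grade, which is why the methodology is marked separately for each set of cut scores.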

The methodology that was ultimately applied to determine cut scores is bolded in Tables 12 through 14.
Tables 16 and 17 (see following page) summarize the recommended cut scores for each performance level
on the CST.

Analysis of the performance level cut scores

We hope that the projected cut scores provide useful information to educators who use NWEA data to help
students succeed in learning and on their state test. In addition to information that can be used to plan
student programs, the study also provides a helpful external look at some important aspects of the
California Standards Test. Some of these include the difficulty of the standards relative to other states,
the difficulty of the state's mathematics standards relative to the ELA standards, and the calibration of
the state's standards between grades.
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Table 16 

Projected Minimum RIT Scores for FALL PRIOR that are Equivalent to Performance Levels on CST
(scores under the below basic cut score project to far below basic;
NWEA percentile rank is in parentheses)

Reading to CST ELA
Grade    Below Basic    Basic       Proficient    Advanced
2        149 (2)        155 (9)     175 (43)      191 (78)
3        162 (8)        178 (23)    194 (59)      205 (86)
4        166 (4)        184 (17)    201 (53)      211 (81)
5        179 (6)        194 (20)    210 (59)      --
6        185 (6)        199 (20)    214 (56)      225 (85)
7        190 (6)        204 (16)    218 (56)      231 (89)
8        196 (8)        210 (20)    224 (62)      236 (90)

Language to CST ELA
Grade    Below Basic    Basic       Proficient    Advanced
2        156 (2)        --          179 (48)      --
3        166 (7)        182 (24)    197 (61)      --
4        169 (4)        189 (18)    204 (55)      --
5        184 (6)        198 (21)    212 (60)      --
6        190 (6)        203 (21)    217 (61)      --
7        195 (7)        207 (23)    220 (61)      --
8        --             212 (27)    224 (64)      --

Math
Grade    Below Basic    Basic       Proficient    Advanced
2        153 (2)        158 (3)     170 (24)      180 (62)
3        162 (8)        177 (15)    190 (49)      203 (87)
4        176 (4)        193 (25)    203 (57)      212 (84)
5        191 (9)        --          215 (68)      227 (92)
6        189 (5)        207 (28)    223 (70)      238 (94)
7        197 (8)        215 (35)    233 (77)      250 (97)
8        --             221 (35)    236 (61)      257 (96)



Table 17 

Projected Minimum RIT Scores for SPRING that are Equivalent to Performance Levels on CST
(scores under the below basic cut score project to far below basic;
NWEA percentile rank is in parentheses)

Reading to CST ELA
Grade    Below Basic    Basic       Proficient    Advanced
2        159 (7)        --          --            -- (83)
3        --             188 (25)    203 (61)      214 (88)
4        --             191 (18)    208 (56)      218 (82)
5        185 (6)        200 (22)    217 (65)      228 (90)
6        --             204 (20)    220 (60)      230 (86)
7        193 (5)        208 (21)    225 (64)      235 (89)
8        --             214 (24)    230 (67)      240 (91)

Language to CST ELA
Grade    Below Basic    Basic       Proficient    Advanced
2        --             --          193 (59)      205 (86)
3        --             191 (26)    206 (66)      217 (91)
4        --             192 (16)    210 (59)      220 (86)
5        --             201 (19)    218 (65)      228 (92)
6        --             205 (20)    221 (54)      229 (80)
7        --             209 (22)    225 (68)      234 (86)
8        --             214 (26)    230 (75)      237 (91)

Math
Grade    Below Basic    Basic       Proficient    Advanced
2        162 (2)        173 (11)    185 (39)      196 (74)
3        173 (3)        190 (22)    202 (56)      215 (90)
4        180 (3)        198 (21)    212 (59)      225 (89)
5        194 (8)        211 (36)    224 (69)      245 (97)
6        189 (3)        214 (32)    231 (71)      252 (96)
7        200 (7)        221 (35)    238 (71)      257 (95)
8        208 (10)       227 (35)    244 (67)      264 (95)
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Comparing California proficiency standards with the estimated standards reported in other state test 
alignment studies 

Northwest Evaluation Association tests have been aligned with the cut scores for the state proficiency test in 16
states. To get an estimate of the difficulty of the California standards in relation to other state tests, we
evaluated the standard used as the cut score for NCLB reporting or the proficient performance level and
compared it to the cut score representing the same standard in these other states. Although the number of states
studied is rapidly increasing, the states studied may not reflect what is typical in regard to these kinds of
standards.

The results are summarized in Table 18. California's cut scores in both reading and mathematics are well above
NWEA's national median scores and rank among the most challenging of the state standards studied, generally
requiring that students perform between the 55th and 70th percentile (with the notable exceptions of grade 2
and grade 10). We recommend caution about drawing any judgments about the quality of California's standards
from this information alone. States establish standards for different purposes. Some states (Washington might
be an example) set standards at a level they believe appropriate for students pursuing post-secondary
education. Others may set standards at a lower level that reflects the literacy needed to be successful in the
workplace. The No Child Left Behind Act requires schools to set targets that would result in all students
achieving a proficient level of performance in about 11 years. While a few communities in California are no
doubt close to achieving this already, many will have to improve the performance of large proportions of their
students to reach this challenging goal. Our point is that standards should be judged on how well they align
with the purposes the community originally wanted to reflect, not purely on how high or low the "bar" is set.
The primary thing the tables make clear is that proficiency standards vary widely from state to state and that
proficiency is not yet a concept that has a shared definition.

Relative difficulty of the mathematics and ELA standards 

Educators may assume that state standard setting processes are designed to produce standards across subjects
that are equal in difficulty. Our previous studies show that this is not always the case. Arizona's math standards,
for example, have been considerably more challenging than their standards for reading, although the state is
taking steps to bring closer alignment between the two subjects. In general, California's standards for Math and
English/Language Arts are similar to each other in difficulty.
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Table 18 - Cut scores representing proficient or "meets standards" level of performance on 16 state assessments 
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Mathematics 
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• Indiana tests students in the fall. Their cut scores were adjusted to reflect equivalent spring performance 
• Colorado uses the partially proficient level of performance for NCLB reporting. To maintain consistency we report the level each state uses for NCLB reporting here.

• The Texas estimate is based on the level for proficient performance that will be implemented in 2005. 
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Calibration of the California Standards Across G rades 



When we say a standard should be calibrated across grades, we mean that a standard should have the same
difficulty at every grade level. Standards for grade 8 should not be considerably easier or more difficult than
the standards for grade 3. Here are the reasons we take this position:

• If standards are used to evaluate the effectiveness of teacher or school performance, equity requires
that the standards be the same for all. It is simply unfair to hold some teachers and students to a
higher standard than others simply because they work at different grade levels. From a practical
point of view, teachers will be reluctant to accept teaching assignments at a grade level if it becomes
known that the standards associated with that grade level are considerably more difficult to achieve
than those imposed at other grades. If you doubt us, call any Arizona middle school principal and
ask if it has been easier to fill 6th or 8th grade math positions in the last couple of years.

• If standards are used to tell teachers and students whether students are on-track to meet
community expectations, it's important that proficiency at third grade truly projects to proficiency
at eighth grade, assuming proficient children achieve normal growth. When this is not the case,
teachers, students, and their parents receive an inaccurate message about the true performance of
their children. In other words, if the third grade standard is considerably easier than the eighth
grade standard, reports will tell some third grade families that their children are proficient, when,
in fact, their performance is very likely to fall short of proficiency in the future.

There are significant issues relative to the calibration of standards within the California State Tests. The
most significant problem is that the standards for performance in the upper grades (grades 6, 7, and 8) are
substantively higher than they are at the younger grades (grades 2, 3, and 4). Let's use mathematics to
illustrate the problem.

Figure 2 (see following page) shows the percentile score associated with proficiency on the spring NWEA
mathematics test. It shows that the percentile score required for passing the test at grades 2 through 4 is
much lower than the near 70th percentile score required to pass the test at grades 6, 7, and 8. Were these
patterns to hold up over time, about 13% of the total testing population identified as proficient in 3rd grade
would fail to meet the standard in 8th for no reason other than lack of calibration in the standard.
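
The size of the calibration gap can be read directly off the percentile ranks reported with the spring mathematics proficient cut scores in Table 17. The sketch below only computes the difference in percentile rank between two grades. The text does not state how the 13% figure was derived, so this should not be read as a reproduction of that calculation, and the helper name `percentile_gap` is hypothetical.

```python
# Spring NWEA percentile ranks attached to the mathematics "proficient"
# cut scores in Table 17 (grade -> percentile rank).
MATH_PROFICIENT_PERCENTILE = {2: 39, 3: 56, 4: 59, 5: 69, 6: 71, 7: 71, 8: 67}


def percentile_gap(lower_grade, upper_grade, table=MATH_PROFICIENT_PERCENTILE):
    """Percentile points by which the upper grade's standard exceeds the
    lower grade's standard."""
    return table[upper_grade] - table[lower_grade]


print(percentile_gap(3, 8))  # 11

# Average standard in the younger and older grade bands.
younger = [MATH_PROFICIENT_PERCENTILE[g] for g in (2, 3, 4)]
older = [MATH_PROFICIENT_PERCENTILE[g] for g in (6, 7, 8)]
print(sum(younger) / len(younger), sum(older) / len(older))
```

Students whose percentile rank sits between the two standards are proficient in the lower grade but would fall short in the upper grade if they merely held their rank.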

Figure 3 is a line graph that compares the RIT score that actually meets the standard at each grade with the
score that would be required at every grade for a student to be on-track to meet the 8th grade standard. The
figure shows that the score currently required by the standard ranges from 3 to 9 points less than the
projected 8th grade cut score in grades 2, 3, and 4. While these differences do not immediately seem large,
when applied over an entire state they result in thousands of students being identified as proficient in grades
2, 3, and 4 who will grow normally and not achieve proficiency at grade 8. This can result in the delay of
needed interventions for these students and can wreak havoc on the stability of adequate yearly progress
statistics.
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Figure 2 - NW EA spring percentile score projecting to proficient level of performance on CST in 

mathematics 
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Figure 3 - RIT score projected to achieve proficient score on one grade's CST vs. RIT score required 
to project to achieve a proficient score on the 8th grade CST
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— a— Performance required to be "on track" for the 8th grade standard 
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Using RIT scores to estimate student probability of achieving passing performance on the CST 

Helping students pass the state test is not the primary reason our members use NWEA assessments. We
hope they are used to provide teachers information that will allow them to improve the learning of all
students. Nevertheless, state test results are important and failing to do well on them can have deleterious
effects on students and their schools. Because of this, we believe educators would benefit from knowing
more about the probability that a student's RIT score would lead to a passing score on the CST. This
would allow educators to more reliably identify students who will need additional resources to reach this
level of performance. Equally important, however, it will allow educators to know which students are "safe"
against California standards so they can focus their time with these students on providing new challenges
that better suit their current needs.

Tables 19 through 24 on the following pages, and the accompanying graphs, show the proportion of
students at each RIT level who earned scores at or above the proficient level on the CST assessments. Using
Table 19 as an example, we find that about 12% of the 5th grade students who achieved a reading RIT score
between 205 and 209 went on to achieve a passing score on the California ELA assessment. A 5th grade
teacher with ten students performing in this range would know that only about one in ten of these students
will be proficient on the CST unless they work harder, receive more focused instruction, or have access to
additional resources.

On the other hand, about 92% of 5th grade students performing at the 225 to 229 level achieved proficiency on
the ELA assessment. Teachers should feel free to focus their efforts with these students on new and more
difficult challenges than the basic fifth grade standards might provide.
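
The way a teacher might use these tables can be sketched as a simple lookup. This is a minimal sketch: it contains only the two grade 5 reading bands quoted in the text (rounded as the text rounds them), and the names `GRADE5_READING`, `proficiency_rate`, and `expected_proficient` are hypothetical.

```python
# Share of grade 5 students who reached proficiency on the CST ELA
# assessment, keyed by reading RIT band (low, high). Only the two bands
# quoted in the text are included here.
GRADE5_READING = {(205, 209): 0.12, (225, 229): 0.92}


def proficiency_rate(rit, table=GRADE5_READING):
    """Return the observed proficiency rate for the band containing `rit`,
    or None when the band is not in the table."""
    for (low, high), rate in table.items():
        if low <= rit <= high:
            return rate
    return None


def expected_proficient(n_students, rate):
    """Expected number of proficient students among `n_students` who all
    fall in a band with the given proficiency rate."""
    return n_students * rate


rate = proficiency_rate(207)
print(rate)
print(round(expected_proficient(10, rate), 1))
```

For ten students in the 205 to 209 band the expected count is a little over one, which matches the "about one in ten" reading in the text.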
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Table 19 

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on PRIOR FALL RIT score - Reading



RIT Score

Grade 2
4.35%  4.67%  7.32%  13.51%  21.59%  29.02%  39.34%  52.78%  66.32%  79.00%  90.49%  95.92%  98.50%  100.00%

Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8
0.33%  2.03%  2.25%  4.23%  7.94%  12.84%  23.18%  42.46%  65.23%  83.98%  93.80%  98.51%  99.38%  98.28%  100.00%
2.16%  1.08%  0.82%  3.17%  7.77%  14.01%  31.37%  55.79%  79.18%  90.77%  97.77%  99.76%  100.00%
0.59%  0.94%  2.11%  5.93%  15.88%  34.83%  58.44%  81.40%  94.04%  98.17%  99.45%  100.00%
0.83%  1.40%  5.88%  17.73%  40.91%  68.37%  87.01%  96.23%  99.33%  100.00%
2.04%  1.47%  1.06%  2.53%  6.28%  18.71%  39.19%  68.57%  88.77%  96.95%  99.76%  100.00%
1.78%  0.61%  1.48%  1.60%  6.33%  12.02%  35.48%  62.88%  84.45%  96.08%  98.23%  99.04%  100.00%


Table 20 

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on same SPRING RIT score - Reading
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Percent of Students Proficient - Spring Percent of Students Proficient - Fall 



Figure 4 - Proportion of students achieving proficient performance level on the CST
English/Language Arts assessment based on PRIOR FALL RIT score - Reading
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Figure 5 - Proportion of students achieving proficient performance level on the CST 
English/Language Arts assessment based on same SPRING RIT score - Reading
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Table 21 

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on PRIOR FALL RIT score - Language Usage

Table 22

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on same SPRING RIT score - Language Usage
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Percent of proficient students - EU 



Figure 6

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on PRIOR FALL RIT score - Language Usage

Figure 7

Proportion of students achieving proficient performance level on the CST English/Language Arts
assessment based on same SPRING RIT score - Language Usage
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RIT Score 
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Table 23 

Proportion of students achieving proficient performance level on the CST mathematics assessment
based on PRIOR FALL RIT score - Mathematics




Table 24 
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Percent of Students Proficient - Mathematic 



Proportion of students achieving proficient performance level on the CST mathematics assessment 
based on same SPRING RIT score - Mathematics 
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Figure 8 

Proportion of students achieving proficient performance level on the CST mathematics assessment
based on PRIOR FALL RIT score - Mathematics
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Percent of Students Proficient - Mathematics 



Figure 9 

Proportion of students achieving proficient performance level on the CST mathematics assessment 
based on same SPRING RIT score - Mathematics 
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Using RIT scores and data from this alignment study to set individual growth targets 

NWEA encourages educators and parents to collaborate on setting individual growth targets for students
based on what we call a "hybrid-growth model". The proficient-standard cut scores for each grade reflect
benchmarks that students who are "on target" would meet if they were to achieve the state's benchmark for
the No Child Left Behind Act. For students who are behind this benchmark, we recommend a growth target
that reflects the norm for their grade and RIT range (see the 2002 NWEA norms study for this
information) plus some proportion of the gap between their current performance and the benchmark that
the student would try to close during the school year. For students whose performance is ahead of
the benchmark, we suggest a target that reflects their current RIT range norm.
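
The hybrid-growth target described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NWEA's implementation. The function name hybrid_growth_target, the parameter gap_share (the proportion of the gap to close this year), and the numbers in the usage lines are all hypothetical. Real norm growth values would come from the 2002 NWEA norms study, and real cut scores from the tables in this study.

```python
def hybrid_growth_target(current_rit, norm_growth, benchmark_rit, gap_share=0.5):
    """Return a target RIT score for the end of the school year.

    current_rit   -- the student's current RIT score
    norm_growth   -- typical growth for the student's grade and RIT range
                     (hypothetical input; look it up in the norms study)
    benchmark_rit -- RIT cut score for the proficient level at the student's grade
    gap_share     -- assumed fraction of the gap to close this year (0 to 1)
    """
    if not 0.0 <= gap_share <= 1.0:
        raise ValueError("gap_share must be between 0 and 1")
    gap = benchmark_rit - current_rit
    if gap > 0:
        # Behind the benchmark: norm growth plus a share of the gap.
        return current_rit + norm_growth + gap_share * gap
    # At or ahead of the benchmark: norm growth only.
    return current_rit + norm_growth


# Illustrative values only (not taken from the study's tables).
behind = hybrid_growth_target(190, 6, 204, gap_share=0.25)  # 190 + 6 + 0.25 * 14
ahead = hybrid_growth_target(210, 5, 204)                   # 210 + 5
```

The choice of gap_share is a local decision for educators and parents; the text above specifies only "some proportion" of the gap.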

This approach ensures that each student has a challenging growth target. It also ensures that low
performing students have targets that will eventually bring them to proficiency standards. Schools that
achieve high rates of success on these kinds of targets will ensure that no child is left behind (to borrow a
phrase) while also making sure that all children have the opportunity to get ahead, regardless of where they
stand against a standard. More information on this approach can be obtained by contacting the Research
team at NWEA.

Summary and Conclusions 

This study investigated the relationship between the scales used for the CST assessments and the RIT scales
used to report performance on Northwest Evaluation Association tests. The study determined RIT score
equivalents for the CST performance levels in English/Language Arts and mathematics. Test records for
more than 73,000 students were included in this study.

Three methods were used to estimate RIT cut scores that could be used to project CST performance
levels. Rasch SOS and second-order regression methods generally produced the most accurate projections
of cut scores. Accuracy of predicting proficient performance on the CST was above 83% for all grades
when spring NWEA scores were used and above 82% for all grades when fall NWEA scores were used.
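
As an illustration of what such an accuracy figure measures, the sketch below computes the share of students whose actual CST proficiency status matches the status projected from a RIT cut score. It assumes that "accuracy" means the proportion of students correctly classified as proficient or not proficient; this section does not spell out the study's exact definition. The function name projection_accuracy and the sample records are hypothetical.

```python
def projection_accuracy(records, rit_cut):
    """Fraction of students whose CST proficiency status matches the projection.

    records -- iterable of (rit_score, was_proficient) pairs
    rit_cut -- estimated RIT cut score for the proficient level
    """
    records = list(records)
    if not records:
        raise ValueError("records must not be empty")
    # A student is projected proficient when the RIT score meets the cut score.
    correct = sum(
        (rit >= rit_cut) == bool(proficient) for rit, proficient in records
    )
    return correct / len(records)


# Made-up records for illustration: (RIT score, proficient on the CST?)
sample = [(195, False), (201, False), (204, True), (210, True), (199, True)]
accuracy = projection_accuracy(sample, rit_cut=203)  # 4 of 5 classified correctly
```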

Readers should exercise some caution about generalizing these results to their own settings. Curricular or
instructional differences unique to a district may influence the accuracy with which the estimated cut
scores reflect actual performance in that setting. With this limitation in mind, we would encourage
educators to use these data as one tool to inform standards-based decisions.

The information gathered in this study came from measures employing the NWEA RIT Scale. Because all
of the research that we have to date indicates that scores generated from computer-based tests and
Achievement Level Test (ALT) scores are virtually interchangeable, readers should feel comfortable applying
the results of this study in any setting that uses the RIT scale.

We hope that data from this study provide useful information to help California educators use NWEA
assessments to better inform, plan, and deliver student instruction. Good information, when matched with
the professionalism and commitment of our colleagues, will ensure that every student has the opportunity
to reach their aspirations.